STEADY (ROBUST) CONDITIONALLY EFFECTIVE ESTIMATION

OF PARAMETERS 

L. S. Gurin and K. A. Tsoy 


Section 1. Formulation of the Problem /3* 

This study is a direct continuation of the works [1, 2],
which introduced the concept of a conditionally-effective estimation
and examined certain particular problems. Estimations that are
optimal for a given criterion under given limitations are called
conditionally-effective. This approach is necessary because, among
the desirable properties of an estimation, along with the properties
that characterize its accuracy (independence, unbiasedness,
efficiency), there are others, such as the complexity of the
estimation algorithms and their stability with respect to deviations
of the error distribution law from the assumed one (see, for
example, [3]).

Several studies have been devoted to the stability of estimation,
beginning with the well-known study of Huber [4]. Of the more recent
works, we would only like to mention [5]. These studies, however, do
not consider the complexity of the algorithms. On the other hand, the
studies [1, 2] consider only the complexity of the algorithms and not
the stability. Both limitations are considered in this article for a
rather simple estimation problem. The method used here to obtain the
best estimation is sufficiently general and can be applied to more
complex problems, and the result obtained for a specific problem has
already been used in practice.


*) Numbers in the margin indicate pagination of the original foreign
text.
Let us first consider the simplest case, that of direct measurement,
in which the influence of the limitations on the choice of the
conditionally-effective estimation is especially clear.

Section 2. Study of the Case of the Direct Measurement /4 

Let us assume there are N measured values of the constant C,
containing independent measurement errors $\xi_i$ distributed
according to the Laplace law, i.e.,

$$x_i = C + \xi_i, \qquad i = 1, \dots, N. \qquad (1)$$

It is necessary to find the conditionally-effective estimation
of the quantity C, i.e.,

$$\hat C = \hat C(x_1, \dots, x_N), \qquad (2)$$

such that the mean square error $M(\hat C - C)^2$ is minimal under the
condition that the computational time on a computer does not exceed $t_0$.

Thus, we consider only the limitation on complexity.


If $t_0$ is very large compared with N, so that the limitations on
complexity are insignificant, then, as is known [6], we should use the
entire sample and take as $\hat C$ the sample median $x^*$, i.e., use the
method of least moduli. If the limitation on $t_0$ is significant, then
it is inadequate to consider only the estimation methods, and we must
turn to specific algorithms. To determine the median, we may use several
algorithms, for example, the following.


$A_1$. Let us form a shortened variational series, i.e., from a part of
the sample of size n we find, in increasing order, the order statistics

$$x_{(1)} \le x_{(2)} \le \dots \le x_{([n/2]+1)}. \qquad (3)$$

Then

$$\hat C = x_{((n+1)/2)} \quad (\text{for odd } n), \qquad \hat C = \tfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) \quad (\text{for even } n).$$

$A_2$. We use the dichotomy method described in [6].
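The text does not spell out how the shortened variational series of $A_1$ is built. As a minimal sketch, assuming simple insertion into a sorted buffer that never holds more than [n/2] + 1 values (our assumption; the authors' implementation is not given), the code below reads off the median for odd and even n and counts the comparisons. For this particular variant the count grows roughly quadratically in n. The function name `median_shortened_series` is hypothetical.

```python
import random
from statistics import median

def median_shortened_series(values):
    """Median via a shortened variational series (hypothetical reading of A1).

    Keeps, in increasing order, only the k = n//2 + 1 smallest values seen so
    far, which is enough to read off the median for both odd and even n.
    Returns (median, number of comparisons).
    """
    n = len(values)
    if n == 0:
        raise ValueError("empty sample")
    k = n // 2 + 1
    buf = []              # sorted ascending, never longer than k
    comparisons = 0
    for x in values:
        pos = len(buf)
        # Scan downward from the largest kept value to find where x belongs.
        while pos > 0:
            comparisons += 1
            if buf[pos - 1] <= x:
                break
            pos -= 1
        if pos < k:
            buf.insert(pos, x)
            if len(buf) > k:
                buf.pop()     # the largest kept value can no longer be among the k smallest
    if n % 2 == 1:
        return buf[k - 1], comparisons                   # x_((n+1)/2)
    return (buf[k - 2] + buf[k - 1]) / 2, comparisons    # (x_(n/2) + x_(n/2+1)) / 2

rng = random.Random(7)
for n in (100, 400):
    _, c = median_shortened_series([rng.random() for _ in range(n)])
    print(n, c)   # comparison count for this insertion variant
```

Quadrupling n multiplies the comparison count by roughly 15 here, which is the kind of growth that makes a linear-time selection method attractive for large samples.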


For comparison, let us examine the method of least squares, reduced
to the algorithm $A_3$, i.e., the determination of the arithmetic mean
of a part of the sample of size n (as is known, this gives an efficient
estimation for a normal law of error distribution).


Let us use $t_i(n)$ to designate the computational time on a /5
computer for the algorithm $A_i$ with sample size n. These functions
depend on the computer used, the language, the translator, etc.
Therefore, they may be obtained by the method of statistical tests. In
several cases, however, we may reach conclusions analytically. Thus, let
us compare the algorithms $A_1$ and $A_2$. We use $m_i(n)$ to designate
the mathematical expectation of the number of comparison operations
when using the algorithm $A_i$.


We have

[Equation (4)]

where $m_2(n)$ satisfies the recurrence relation

[Equation (5)]

Let us prove that

$$m_2(n) \le 4n. \qquad (6)$$

Let us set

[Equation]


Then, from (5), we have

[Equation]

We may prove by induction that

[Equation]

From these relations,

[Equations (7) and (8)]

we have

[Equation]
Let us denote by $\varphi(n)$ the following function:

[Equation]

Then we have

[Equation (9)]

[Equation (10)]

If $\psi(n) \le \varphi(n)$, then $\psi(n)$ will be monotonically
nondecreasing. But the condition $\varphi(n) \ge \psi(n)$ gives (taking
into account (10))

[Equation (11)]


Let us consider two possibilities: a) beginning with some sufficiently
large n, the condition (11) is always satisfied; b) there are arbitrarily
large n for which the condition (11) is not satisfied.

In the first case, beginning with some n, $\psi(n)$ is monotonically
nondecreasing, and consequently we have


/6

[Equation (12)]

i.e., beginning with a certain n, [...] for an arbitrarily small
$\varepsilon > 0$. Using (8), for any sufficiently large n (for which
[...]) we obtain

[Equation (13)]

Passing to the limit as $n \to \infty$, we obtain

[Equation (14)]

However, we have [...]; consequently,

[Equation (15)]

Thus, in this case (6) is proven.

Let us consider the second case. If the condition (11) is not satisfied
for some n, then for such n we have

[Equation]

In addition,

[Equation (16)]

Thus, $\psi(n)$ either lies below $\varphi(n)$ and is monotonically
nondecreasing, or it lies above $\varphi(n)$. Let us consider one of
these values $n_1$:

[Equation (17)]

If [...], this case is of no interest to us. Thus, in addition to (17),
let us assume that

[Equation (18)]

Then, beginning with the value $n_1 + 1$, $\psi(n)$ again increases. Let
us estimate the difference [...]. It is clear that [...], and
$\psi(n_1)$, if it does not equal $\varphi(n_1)$, equals [...]
(according to the definition of the function $\psi$). Thus,

[Equation (19)]

However,

[Equation (20)]

We should note that the form of (20) depends on whether $n_1$ is even or
odd, but we make the estimate for the worst case. Considering that the
left-hand side is positive, we may again discard the negative terms, and
we obtain the estimate (recall that [...]):

[Equation (21)]

Thus, in every case

[Equation (22)]

We thus find that (6) holds in this case as well.

A comparison of (4) and (6) shows that for sufficiently large n the
algorithm $A_2$ is better, even if the noncomputational operations
(which are mainly connected with the organization of the loops) make up
a larger part of the algorithm $A_2$ than of the algorithm $A_1$.
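The dichotomy method of [6] is not reproduced in this text. As an illustration only, the sketch below swaps in a different, well-known dichotomy scheme: Hoare-style selection (quickselect) with random pivots. It counts comparisons so that the linear growth of $m_2(n)$ can be checked empirically; for this scheme the expected count for the median is known to approach about 2(1 + ln 2)n, roughly 3.39n. The names `select_kth` and `median_by_dichotomy` are hypothetical, and only odd n is handled to keep the sketch short.

```python
import random
from statistics import median

def select_kth(values, k, rng):
    """Return (k-th smallest value, number of comparisons), k counted from 0.

    Repeatedly splits the current part of the sample around a random pivot
    and keeps only the part that contains the k-th order statistic.
    """
    data = list(values)
    comparisons = 0
    while True:
        if len(data) == 1:
            return data[0], comparisons
        pivot = data[rng.randrange(len(data))]
        less, equal, greater = [], [], []
        for x in data:
            if x < pivot:
                less.append(x)
            elif x > pivot:
                greater.append(x)
            else:
                equal.append(x)
        comparisons += len(data) - 1   # one three-way comparison per non-pivot element
        if k < len(less):
            data = less
        elif k < len(less) + len(equal):
            return pivot, comparisons
        else:
            k -= len(less) + len(equal)
            data = greater

def median_by_dichotomy(values, rng):
    """Median for odd n as the middle order statistic (hypothetical stand-in for A2)."""
    if len(values) % 2 == 0:
        raise ValueError("this sketch handles odd n only")
    return select_kth(values, len(values) // 2, rng)

rng = random.Random(12345)
n, trials = 1001, 200
total = sum(median_by_dichotomy([rng.random() for _ in range(n)], rng)[1]
            for _ in range(trials))
print(total / trials / n)   # average comparisons per sample element
```

The averaged count per element stays well below 4 for this scheme, consistent in spirit with the 4n comparisons attributed to $A_2$ later in this section.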

When comparing the algorithms $A_2$ and $A_3$, we must consider that the
algorithm $A_2$ is better under the condition

$$t_2(n) < t_3(2n) \qquad (23)$$

(since the variances of the estimations are equal if the algorithm $A_3$
uses a sample that is twice as large [6]).
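The parenthetical claim about sample sizes can be checked by simulation. A minimal sketch, assuming Laplace errors with scale b generated as the difference of two independent exponential variables (a standard identity) and C = 0 without loss of generality: asymptotically the median of n observations has variance b²/n, the same as the mean of 2n observations (2b²/(2n)), so the estimated ratio should be close to 1 for moderate n. The helper names `laplace` and `variance_ratio` are hypothetical.

```python
import random
from statistics import fmean, median, pvariance

def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def variance_ratio(n, trials, scale=1.0, seed=0):
    """Monte Carlo estimate of Var(median of n) / Var(mean of 2n) for C = 0."""
    rng = random.Random(seed)
    medians, means = [], []
    for _ in range(trials):
        medians.append(median([laplace(rng, scale) for _ in range(n)]))
        means.append(fmean([laplace(rng, scale) for _ in range(2 * n)]))
    return pvariance(medians) / pvariance(means)

print(variance_ratio(n=101, trials=2000))   # should be close to 1
```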

The algorithm $A_3$ for the sample size 2n includes 2n - 1 additions and
one division, and $A_2$ includes 4n comparisons for the sample size n.
Since these quantities are of the same order, and the computational time
depends on several factors that are not considered here, a final
conclusion can be reached only on the basis of a numerical experiment on
a computer. This experiment, done on the BESM-6 computer for the
corresponding sample sizes, gave a ratio of computational times of 2.8
in favor of the algorithm $A_3$.

Thus, although the method of least moduli gives an efficient
estimation, the conditionally-effective estimation is obtained with an
algorithm employing the method of least squares.
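The trade-off behind condition (23) and the measured ratio of 2.8 can be made concrete with a small calculation. This is a minimal sketch under two assumptions of ours, not the authors': run time grows linearly with sample size, and the variances follow the asymptotic Laplace values (b²/n for the median, 2b²/n for the mean). Reading the ratio as t_2(n) = 2.8 t_3(2n) then makes the per-point cost of the median 5.6 times that of the mean. Under a common time budget the mean reaches a variance about 2.8 times smaller, while if t_0 is so large that the whole sample can be processed, the median wins. `Algorithm` and `rank_under_budget` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Algorithm:
    name: str
    time_per_point: float   # assumed linear cost model: t(n) = time_per_point * n
    variance_const: float   # assumed asymptotic variance: variance_const / n

def rank_under_budget(algorithms, n_available, t0):
    """Rank algorithms by estimation variance under the time limit t0.

    Each algorithm uses the largest part of the sample it can process in t0,
    n = min(n_available, floor(t0 / time_per_point)); algorithms that cannot
    process even one point are dropped. Returns sorted (variance, name, n).
    """
    ranked = []
    for alg in algorithms:
        n = min(n_available, int(t0 // alg.time_per_point))
        if n >= 1:
            ranked.append((alg.variance_const / n, alg.name, n))
    return sorted(ranked)

# Laplace scale b = 1: Var(median) ~ 1/n, Var(mean) ~ 2/n.
# t2(n) = 2.8 * t3(2n) under the linear model gives a per-point cost ratio of 5.6.
median_alg = Algorithm("A2 (median)", time_per_point=5.6, variance_const=1.0)
mean_alg = Algorithm("A3 (mean)", time_per_point=1.0, variance_const=2.0)
print(rank_under_budget([median_alg, mean_alg], n_available=10**6, t0=56000.0))
```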

However, if we consider the stability of the estimation, the picture
changes considerably. It is known that the median is much less sensitive
to "lost" points (outliers) than the arithmetic mean (see, for example,
[5]). If we pass from the simplest problem to a more complex one, even to
problems of linear regression analysis, the picture becomes much more
complicated. However, the considerations presented in this section show
that, to obtain stable conditionally-effective estimations, we must use
combined algorithms that employ both the method of least squares and the
method of least moduli. Let us consider the corresponding problem.


Section 3. Study of the Problem of Linear Regression 


/9 


We shall use the model

$$y_i = \alpha + \beta x_i + \xi_i, \qquad i = 1, \dots, N, \qquad (24)$$

where $y_i$ are the measurement results, $x_i$ are the values of the
independent variable, and the measurement errors $\xi_i$ are independent
and have a distribution close to the normal one; more precisely, the
probability density of $\xi_i$ has the form

[Equation (25)]

In formula (25),

[Equation (26)]

At a = 0, we obtain the normal law for the error distribution.


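The following algorithms (estimations) are considered.

$A_1$. The customary algorithm for the method of least squares (MLS).

$A_2$. The algorithm for the method of least squares with preliminary
grouping. The N values of the independent variable x (we assume that the
measurements are performed at equally spaced values of x) form n groups
of m = N/n points each. We have

$$\bar x_j = \frac{1}{m} \sum_{i \in G_j} x_i, \qquad \bar y_j = \frac{1}{m} \sum_{i \in G_j} y_i, \qquad j = 1, \dots, n, \qquad (27)$$

where $G_j$ is the set of indices of the points in the j-th group.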


It is clear that, instead of (24), we obtain the problem

$$\bar y_j = c + d\,\bar x_j + \bar\xi_j, \qquad j = 1, \dots, n, \qquad (28)$$

and the estimations $\hat c$, $\hat d$ obtained by the regular MLS may be
used as the estimations of $\alpha$ and $\beta$.

Below, we shall use $\hat\alpha_j$, $\hat\beta_j$ to designate the
estimations of $\alpha$ and $\beta$ obtained using the algorithm $A_j$. For greater definiteness, /10
we introduce a second index m for estimations with grouping. This index
designates the number of points in a group; for example, we shall
designate definite estimations as follows:

$$\hat\alpha_{j,m} = \hat c, \quad \hat\beta_{j,m} = \hat d, \qquad j = 2. \qquad (29)$$


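$A_3$. Combined algorithm with preliminary grouping. In this case, in
contrast to $A_2$, $\bar y_j$ in (27) is replaced by the median of the
corresponding group of values $y_i$, which we shall designate by
$\tilde y_j$. Then, instead of (28), we obtain

$$\tilde y_j = e + f\,\bar x_j + \tilde\xi_j, \qquad j = 1, \dots, n, \qquad (30)$$

and, consequently,

$$\hat\alpha_{j,m} = \hat e, \quad \hat\beta_{j,m} = \hat f, \qquad j = 3, \qquad (31)$$

where $\hat e$ and $\hat f$ are the estimations of e and f from (30),
obtained by the regular MLS.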


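Thus, we have the following set of estimations:

$$\hat\alpha_1, \hat\beta_1; \quad \hat\alpha_{2,m}, \hat\beta_{2,m}; \quad \hat\alpha_{3,m}, \hat\beta_{3,m}$$

(the second index takes several values).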
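The three algorithms can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' program: groups are taken as consecutive blocks of m points (consistent with equally spaced x), N must be divisible by m, and the function names `fit_a1`, `fit_a2`, `fit_a3` are hypothetical. $A_2$ replaces each group by its mean, $A_3$ by the median of its y values, and both then apply ordinary least squares to the n group points; the error law (25) is not needed for the fitting step itself.

```python
from statistics import fmean, median

def ols(xs, ys):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return ybar - beta * xbar, beta

def grouped_fit(xs, ys, m, centre):
    """Replace consecutive blocks of m points by (mean x, centre of y), then fit by OLS.

    Assumption: groups are consecutive blocks of equally spaced x values.
    """
    if len(xs) != len(ys) or len(xs) % m != 0:
        raise ValueError("xs and ys must have equal length N, a multiple of m")
    gx = [fmean(xs[s:s + m]) for s in range(0, len(xs), m)]
    gy = [centre(ys[s:s + m]) for s in range(0, len(ys), m)]
    return ols(gx, gy)

def fit_a1(xs, ys):
    return ols(xs, ys)                      # A1: ordinary MLS on all N points

def fit_a2(xs, ys, m):
    return grouped_fit(xs, ys, m, fmean)    # A2: group means, then MLS

def fit_a3(xs, ys, m):
    return grouped_fit(xs, ys, m, median)   # A3: group medians, then MLS

xs = [float(i) for i in range(15)]          # N = 15, equally spaced x
ys = [2.0 + 0.5 * x for x in xs]
ys[2] += 1000.0                             # one gross outlier at the end of the first group
print(fit_a1(xs, ys), fit_a3(xs, ys, 3))
```

With noise-free data on a line, all three variants recover α and β exactly; a single gross outlier at the end of a group of three leaves the $A_3$ sketch unaffected, while it shifts the ordinary least-squares slope substantially.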


To compare the estimations, the method of statistical modeling was used.
We set β = 0 in the model (24). In addition, the cases a = 0; 0.05; 0.25
were considered in (25). In formula (26), the parameter is set equal to
30. In addition, for N = 15 we take m = 3; 5, and for N = 45, m = 5, 5, 15.
For each estimation, using the results of M = 100 realizations, we obtain
the average computational time on the computer (in seconds per 100
realizations), the average bias, and the average variance. The results of
the experiment are given in Tables 1 and 2.

Section 4. Conclusions and Problems of Further Research 


1. At a = 0, grouping is not advantageous, as would be expected from
theoretical considerations.

2. For the linear model, the individual algorithms differ very little in
computational time. /13

3. For deviations from the normal distribution (a > 0), the algorithm
$A_3$ is more advantageous than $A_2$, since it leads to increased
accuracy. Moreover, for large values of a, it is better to use algorithms
with large values of m (for large deviations from the normal
distribution, the estimation in the form of the median becomes more
efficient the larger the group).

Considering that in real problems we may expect smaller values of a, on
the basis of the conclusions given above, we may recommend the algorithm
$A_3$ with the value
TABLE 1

QUALITY INDICES OF ESTIMATIONS AT N = 15


Algorithm 

(estimation) 


AO* 1 0,0937 -j-0,0773 0,7307 
Ztr 0,0942 - 0,1103 -0,0030 
0,2335 4,9961 22,2970 
?U) | 0,0037 1 0,0789 0,3558 

TABLE 2

QUALITY INDICES OF ESTIMATIONS AT N = 45
The conclusions obtained may be refined and expanded as a result of
future research, which is presently being carried out in the following
directions:

1. Estimations for nonlinear models are being considered. In this case,
there should be a greater difference between individual algorithms in
terms of complexity.

2. For the linear regression problem considered here, algorithms based
on excluding the "lost" points (outliers) [7, 8] are additionally being
studied.